
Inter-cluster thread-to-core mapping and DVFS on heterogeneous multi-cores


Abstract

Heterogeneous multi-core platforms that contain different types of cores organized as clusters, such as ARM's big.LITTLE architecture, are emerging. These platforms often have to run multiple applications concurrently, each with different performance requirements. Because the applications share resources, the resulting workloads are varying and mixed (e.g. compute- and memory-intensive). Run-time management is required to adapt to such performance requirements and workload variability and to achieve energy efficiency. The management becomes more challenging when the applications are multi-threaded and the heterogeneity needs to be exploited. Existing run-time management approaches do not efficiently exploit cores situated in different clusters simultaneously (referred to as inter-cluster exploitation) together with the DVFS potential of the cores; addressing this gap is the aim of this paper. Such exploitation might help to satisfy the performance requirements while achieving energy savings at the same time. We therefore propose a run-time management approach that first selects a thread-to-core mapping based on the performance requirements and resource availability. It then applies online adaptation by adjusting the voltage-frequency (V-f) levels to optimize energy without trading off application performance. The thread-to-core mapping uses offline profiled results, which contain the performance and energy characteristics of applications executed on the heterogeneous platform using different types of cores in various possible combinations. For an application, the thread-to-core mapping process defines the number and type of cores used, which are situated in different clusters. The online adaptation process classifies the inherent workload characteristics of concurrently executing applications and, as demonstrated in this paper, incurs a lower overhead than existing learning-based approaches. The workload is classified using the metric Memory Reads Per Instruction (MRPI). The adaptation process proactively selects an appropriate V-f pair for a predicted workload. Subsequently, it monitors the workload prediction error and the performance loss, quantified by instructions per second (IPS), and adjusts the chosen V-f pair to compensate. We validate the proposed run-time management approach on a hardware platform, the Odroid-XU3, with various combinations of multi-threaded applications from the PARSEC and SPLASH benchmarks. Results show an average improvement in energy efficiency of up to 33% compared to existing approaches, while meeting the performance requirements.
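
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the profile entries, the MRPI thresholds, the frequency table, the class-to-level mapping, and the one-step compensation rule are all hypothetical values and names chosen for illustration. The sketch assumes that the mapping stage picks the lowest-power profiled core combination that meets the required IPS with the cores currently free, and that the adaptation stage starts from a frequency level chosen by workload class and raises it by one step when the measured IPS falls short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileEntry:
    """One offline-profiled core combination (all values are hypothetical)."""
    big: int        # number of big cores used
    little: int     # number of LITTLE cores used
    ips: float      # profiled instructions per second
    power_w: float  # profiled average power in watts


def select_mapping(profile, required_ips, free_big, free_little):
    """Return the lowest-power entry that meets required_ips with the free cores."""
    feasible = [
        e for e in profile
        if e.ips >= required_ips and e.big <= free_big and e.little <= free_little
    ]
    if not feasible:
        return None  # no profiled combination satisfies the request
    return min(feasible, key=lambda e: e.power_w)


def mrpi(memory_reads, instructions):
    """Memory Reads Per Instruction over one sampling interval."""
    return memory_reads / instructions if instructions else 0.0


def classify(mrpi_value, low=0.01, high=0.04):
    """Classify a workload by MRPI; the thresholds are illustrative assumptions."""
    if mrpi_value < low:
        return "compute"
    if mrpi_value < high:
        return "mixed"
    return "memory"


# Hypothetical frequency table (MHz) and starting level per workload class.
FREQ_LEVELS_MHZ = [800, 1000, 1200, 1400, 1600, 1800, 2000]
CLASS_TO_LEVEL = {"compute": 6, "mixed": 4, "memory": 2}


def adapt_level(predicted_mrpi, measured_ips, required_ips):
    """Pick a level for the predicted workload, then compensate for IPS loss."""
    level = CLASS_TO_LEVEL[classify(predicted_mrpi)]
    if measured_ips < required_ips:
        # Performance loss observed: step up one level, capped at the maximum.
        level = min(level + 1, len(FREQ_LEVELS_MHZ) - 1)
    return level


if __name__ == "__main__":
    profile = [
        ProfileEntry(4, 0, 4.0e9, 5.0),
        ProfileEntry(2, 2, 3.2e9, 3.1),
        ProfileEntry(1, 3, 2.4e9, 2.0),
        ProfileEntry(0, 4, 1.5e9, 0.9),
    ]
    print(select_mapping(profile, required_ips=3.0e9, free_big=4, free_little=4))
    level = adapt_level(mrpi(50, 1000), measured_ips=1.0e9, required_ips=2.0e9)
    print(FREQ_LEVELS_MHZ[level], "MHz")
```

In this sketch a memory-intensive interval starts at a lower frequency, on the assumption that such workloads are less sensitive to core frequency, and the IPS check stands in for the monitoring of prediction error and performance loss that the abstract mentions.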
